APPARATUS AND METHOD TO COORDINATE MULTIPLE DATA STORAGE AND RETRIEVAL SYSTEMS

Field Of The Invention 

This invention relates to an apparatus and method to coordinate multiple data
storage and retrieval systems. In certain embodiments, the invention relates to an
apparatus and method to ensure sequential data consistency in multiple data storage and
retrieval systems.

Background Of The Invention 

Many data processing systems require a large amount of data storage, for use in 
efficiently accessing, modifying, and re-storing data. Data storage is typically separated 
into several different levels, each level exhibiting a different data access time or data 
storage cost. A first, or highest level of data storage involves electronic memory, usually 
dynamic or static random access memory (DRAM or SRAM). Electronic memories take 
the form of semiconductor integrated circuits where millions of bytes of data can be 
stored on each circuit, with access to such bytes of data measured in nanoseconds. The
electronic memory provides the fastest access to data since access is entirely electronic.

A second level of data storage usually involves direct access storage devices 
(DASD). DASD storage, for example, includes magnetic and/or optical disks. Data bits 
are stored as micrometer-sized or less magnetically or optically altered spots on a disk 
surface, representing the "ones" and "zeros" that comprise the binary value of the data
bits. Magnetic DASD includes one or more disks that are coated with remnant magnetic
material. DASDs can store gigabytes of data, and the access to such data is typically
measured in milliseconds, i.e. orders of magnitude slower than electronic memory.

Having a backup data copy is mandatory for many businesses for which data loss
would be catastrophic. The time required to recover lost data is also an important
recovery consideration. With tape or library backup, primary data is periodically
backed up by making a copy on tape or library storage at a remote storage location.

Data disaster recovery solutions include peer-to-peer copy where data is backed
up not only remotely, but also continuously, either synchronously or asynchronously.
Using such a peer-to-peer network, the secondary data must be "order consistent," that is,
secondary data is copied in the same sequential order as the primary data, i.e. sequential
consistency. Without sequential consistency, inconsistent secondary data would result,
thus corrupting disaster recovery.
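
By way of non-limiting illustration, the following Python sketch shows one way a
secondary copy could apply updates in the same sequential order in which they were
written to the primary. The sequence numbers, the SecondaryCopy class, and its receive
method are assumptions made for this example only and are not recited elsewhere herein.

```python
class SecondaryCopy:
    """Hypothetical secondary volume that applies writes strictly in primary order."""

    def __init__(self):
        self.data = {}       # block address -> most recently applied value
        self.next_seq = 0    # next primary sequence number expected
        self.pending = {}    # out-of-order writes held until their turn

    def receive(self, seq, address, value):
        """Accept a write tagged with its (assumed) primary sequence number."""
        self.pending[seq] = (address, value)
        # Apply every held write that is now contiguous with what was applied.
        while self.next_seq in self.pending:
            addr, val = self.pending.pop(self.next_seq)
            self.data[addr] = val
            self.next_seq += 1


secondary = SecondaryCopy()
secondary.receive(1, "balance", 100)   # second primary write arrives first; held back
secondary.receive(0, "balance", 400)   # first primary write arrives; both are applied
```

In this sketch the later-arriving first write releases the held second write, so the
secondary ends with the same final value as the primary.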

What is needed is a method to coordinate multiple data storage and retrieval
systems. More particularly, what is needed is a method to ensure the sequential
consistency of data stored in those multiple data storage and retrieval systems.

Summary Of The Invention 
Applicants' invention includes a method to coordinate interconnected information
storage and retrieval systems, where each of the information storage and retrieval
systems is capable of communicating with one or more host computers. Applicants' method
provides a plurality of controllers, where at least one of those plurality of controllers is
disposed in each of the information storage and retrieval systems.

Applicants' method designates one of the plurality of controllers as a master
controller and the remaining controllers as target controllers, generates one or more
master controller commands by that master controller, and provides those one or more
master controller commands to each of the target controllers, where the one or more
master controller commands cause each of those target controllers to adjust the flow of
data into and out of each of the information storage and retrieval systems.

Brief Description Of The Drawings 
The invention will be better understood from a reading of the following detailed
description taken in conjunction with the drawings in which like reference designators are
used to designate like elements, and in which:

FIG. 1 is a block diagram showing the components of Applicants' data storage
and retrieval system;

FIG. 2 is a flow chart summarizing the steps in Applicants' method;

FIG. 3 is a block diagram showing three interconnected data storage and retrieval
systems and a host computer;

FIG. 4 is a block diagram showing the three data storage and retrieval systems
and host computer of FIG. 3 interconnected to three remote storage locations.

Detailed Description Of The Preferred Embodiments 
Referring to the illustrations, like numerals correspond to like parts depicted in
the Figures. The invention will be described as embodied in a system comprising
multiple information storage and retrieval systems. In certain embodiments, one or more
of Applicants' information storage and retrieval systems comprises two or more
subsystems sometimes referred to as "clusters." In certain embodiments, one or more of
Applicants' information storage and retrieval systems do not include individual clusters.

Referring now to FIG. 1, Applicants' information storage and retrieval system
100 includes a first subsystem 101A and a second subsystem 101B. Each subsystem
includes a processor portion 130 / 140 and an input/output portion 160 / 170. Internal
PCI buses in each subsystem are connected via a Remote I/O bridge 155 / 165 between
the processor portions 130 / 140 and I/O portions 160 / 170, respectively.

Information storage and retrieval system 100 further includes a plurality of input /
output ("I/O") adapters 102 - 105, 107 - 110, 112 - 115, and 117 - 120, disposed in four
bays 101, 106, 111, and 116. Each I/O adapter may comprise one Fibre Channel port,
one FICON port, two ESCON ports, or two SCSI ports. Each I/O adapter is connected to
both subsystems through one or more Common Platform Interconnect buses 121 and 150
such that each subsystem can handle I/O from any I/O adapter.

Processor portion 130 includes processor 132 and cache 134. In certain
embodiments, processor 132 comprises a 64-bit RISC based symmetric multiprocessor.
In certain embodiments, processor 132 includes built-in fault and error-correction
functions. Cache 134 is used to store both read and write data to improve performance to
the attached host systems. In certain embodiments, cache 134 comprises about 4
gigabytes. In certain embodiments, cache 134 comprises about 8 gigabytes. In certain
embodiments, cache 134 comprises about 12 gigabytes. In certain embodiments, cache
134 comprises about 16 gigabytes. In certain embodiments, cache 134 comprises about
32 gigabytes.

Processor portion 140 includes processor 142 and cache 144. In certain
embodiments, processor 142 comprises a 64-bit RISC based symmetric multiprocessor.
In certain embodiments, processor 142 includes built-in fault and error-correction
functions. Cache 144 is used to store both read and write data to improve performance to
the attached host systems. In certain embodiments, cache 144 comprises about 4
gigabytes. In certain embodiments, cache 144 comprises about 8 gigabytes. In certain
embodiments, cache 144 comprises about 12 gigabytes. In certain embodiments, cache
144 comprises about 16 gigabytes. In certain embodiments, cache 144 comprises about
32 gigabytes.

I/O portion 160 includes non-volatile storage ("NVS") 162 and NVS batteries
164. NVS 162 is used to store a second copy of write data to ensure data integrity should
there be a power failure or a subsystem failure and the cache copy of that data is lost.
NVS 162 stores write data provided to subsystem 101B. In certain embodiments, NVS
162 comprises about 1 gigabyte of storage. In certain embodiments, NVS 162 comprises
four separate memory cards. In certain embodiments, each pair of NVS cards has a
battery-powered charging system that protects data even if power is lost on the entire
system for up to 72 hours.

I/O portion 170 includes NVS 172 and NVS batteries 174. NVS 172 stores write
data provided to subsystem 101A. In certain embodiments, NVS 172 comprises about 1
gigabyte of storage. In certain embodiments, NVS 172 comprises four separate memory
cards. In certain embodiments, each pair of NVS cards has a battery-powered charging
system that protects data even if power is lost on the entire system for up to 72 hours.

In the event of a failure of subsystem 101B, the write data for the failed
subsystem will reside in the NVS 162 disposed in the surviving subsystem 101A. This
write data is then destaged at high priority to the hard disk arrays. At the same time, the
surviving subsystem 101A will begin using NVS 162 for its own write data thereby
ensuring that two copies of write data are still maintained.
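
As a minimal sketch of the cross-subsystem arrangement described above, and not a
description of the actual firmware, the following Python example keeps each write in the
receiving subsystem's cache and in the peer subsystem's NVS, then destages the peer's
NVS contents when the receiving subsystem fails. The Subsystem class, the fail_over
function, and the dictionary standing in for the hard disk arrays are assumptions made
for illustration.

```python
class Subsystem:
    """Hypothetical subsystem holding a volatile cache and a non-volatile store."""

    def __init__(self, name):
        self.name = name
        self.cache = {}   # volatile copy of this subsystem's own write data
        self.nvs = {}     # non-volatile copy of the PEER subsystem's write data
        self.peer = None

    def write(self, address, value):
        """Keep two copies: one in the local cache, one in the peer's NVS."""
        self.cache[address] = value
        self.peer.nvs[address] = value


def fail_over(survivor, disk):
    """Destage the failed peer's write data held in the survivor's NVS."""
    disk.update(survivor.nvs)
    survivor.nvs.clear()


subsystem_a, subsystem_b = Subsystem("101A"), Subsystem("101B")
subsystem_a.peer, subsystem_b.peer = subsystem_b, subsystem_a
disk = {}

subsystem_b.write("track-7", "payload")
subsystem_b.cache.clear()          # simulate loss of subsystem 101B and its cache
fail_over(subsystem_a, disk)       # survivor 101A destages the second copy
```

The sketch omits the step in which the surviving subsystem begins using the same NVS
for its own write data.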

I/O portion 160 further comprises a plurality of device adapters, such as device
adapters 165, 166, 167, and 168, and sixteen disk drives organized into two arrays,
namely array "A" and array "B". In certain embodiments, arrays "A" and "B" utilize a
RAID protocol. In certain embodiments, arrays "A" and "B" comprise what is
sometimes called a JBOD array, i.e. "Just a Bunch Of Disks," where the array is not
configured according to RAID. The illustrated embodiment of FIG. 1 shows two hard
disk arrays. In other embodiments, Applicants' information storage and retrieval system
includes more than two hard disk arrays.

Applicants' invention includes a method to coordinate multiple information
storage and retrieval systems. FIG. 2 summarizes the steps in Applicants' method.
Referring now to FIG. 2, in step 205 Applicants' method provides a plurality of
controllers and one or more interconnected information storage and retrieval systems,
wherein each of those information storage and retrieval systems includes one or more
controllers.

For example, the illustrated embodiment of FIG. 3 includes three (3) information
storage and retrieval systems, namely systems 301, 331, and 361. Information storage
and retrieval systems 301, 331, and 361, each comprise one or more I/O adapters, such as
I/O adapters 302 / 303, I/O adapters 332 / 333, and I/O adapters 362 / 363, respectively.
In the illustrated embodiment of FIG. 3, information storage and retrieval systems 301,
331, and 361, each include two subsystems, namely 301a/301b, 331a/331b, and
361a/361b, respectively. Subsystems 301a and 301b communicate with hard disk arrays
307 and 308 via device adapter 306. Subsystems 331a and 331b communicate with hard
disk arrays 337 and 338 via device adapter 336. Subsystems 361a and 361b
communicate with hard disk arrays 367 and 368 via device adapter 366.

References herein to "subsystems" should not be interpreted to mean that either
Applicants' apparatus or method is limited to information storage and retrieval systems
comprising two subsystems. In certain embodiments, one or more of Applicants'
information storage and retrieval systems include a single system. In certain
embodiments, one or more of Applicants' information storage and retrieval systems
include two subsystems. In certain embodiments, one or more of Applicants' information
storage and retrieval systems include more than two subsystems.

Each system / subsystem includes an information cache, such as cache 305a,
305b, 335a, 335b, 365a, and 365b. Each system / subsystem includes at least one
controller, such as controller 310, 320, 340, 350, 370, and 380. Each controller includes
logic, such as logic 312, 322, 342, 352, 372, and 382. That logic enables each of
Applicants' controllers to function as a master controller, or as a target controller, or as
both a master controller and a target controller.
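
One possible way to represent the three roles just described is a pair of flags that
may be set independently or together. The Python sketch below is offered only as an
illustration; the Role enumeration, the Controller class, and the acts_as method are
hypothetical names and do not describe logic 312 - 382.

```python
from enum import Flag, auto


class Role(Flag):
    """Hypothetical role flags; a controller may hold either role or both."""
    MASTER = auto()
    TARGET = auto()


class Controller:
    """Hypothetical controller record; `ident` mirrors a reference numeral."""

    def __init__(self, ident, role=Role.TARGET):
        self.ident = ident
        self.role = role

    def acts_as(self, role):
        """Return True if every requested role flag is set on this controller."""
        return (self.role & role) == role


controller_310 = Controller(310, Role.MASTER | Role.TARGET)  # both roles at once
controller_320 = Controller(320)                             # target only
```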

By "master controller," Applicants mean a data storage and retrieval system 
controller that receives one or more commands from one or more host computers and 



TUC920030045US1 



then issues one or more master controller commands to the other data storage and
retrieval system controllers. By "target controller," Applicants mean a data storage and
retrieval system controller that receives commands from either a host computer or a
master controller, but does not issue commands to other target data storage and retrieval
system controllers.

Each controller further includes a computer useable medium, such as computer
useable media 314, 324, 344, 354, 374, and 384, having computer readable program code
disposed therein to coordinate multiple information storage and retrieval systems as a
master controller, or as a target controller, or as both a master controller and a target
controller. In certain embodiments, each controller further includes one or more
computer program products, such as computer program products 316, 326, 346, 356, 376,
and 386, usable with a programmable computer processor having computer readable
program code embodied therein to coordinate multiple information storage and
retrieval systems as a master controller, or as a target controller, or as both a master
controller and a target controller.

In the illustrated embodiment of FIG. 3, communication link 395 interconnects
controllers 310, 320, 340, 350, 370, and 380. In certain embodiments, communication
link 395 is selected from a serial interconnection, such as RS-232 or RS-422, an ethernet
interconnection, a SCSI interconnection, a Fibre Channel interconnection, an ESCON
interconnection, a FICON interconnection, a Local Area Network (LAN), a private Wide
Area Network (WAN), a public wide area network, Storage Area Network (SAN),
Transmission Control Protocol/Internet Protocol (TCP/IP), the Internet, and combinations
thereof.

Controller 310 is interconnected with communication link 395 via communication
links 315 and 318, bridge 304, and I/O adapter 303. Controller 320 is interconnected
with communication link 395 via communication links 315 and 328, bridge 304, and I/O
adapter 303. Controller 340 is interconnected with communication link 395 via
communication links 345 and 348, bridge 334, and I/O adapter 333. Controller 350 is
interconnected with communication link 395 via communication links 345 and 358,
bridge 334, and I/O adapter 333. Controller 370 is interconnected with communication
link 395 via communication links 375 and 378, bridge 364, and I/O adapter 363.
Controller 380 is interconnected with communication link 395 via communication links
375 and 388, bridge 364, and I/O adapter 363. In certain embodiments, communication
links 315, 318, 328, 345, 348, 358, 375, 378, and 388, are selected from a serial
interconnection, such as an RS-232 or an RS-422, an ethernet interconnection, a SCSI
interconnection, a Fibre Channel interconnection, an ESCON interconnection, a FICON
interconnection, and combinations thereof.

Referring again to FIG. 2, in step 210 each of the plurality of controllers performs
peer-to-peer remote copy ("PPRC") operations independently of the other interconnected
storage system controllers. Referring now to FIGs. 2 and 4, information storage and
retrieval system 301 is interconnected with remote storage location 401 via
communication link 410. Information storage and retrieval system 331 is interconnected
with remote storage location 431 via communication link 430. Information storage and
retrieval system 361 is interconnected with remote storage location 461 via
communication link 460. In certain embodiments, communication links 410, 430, and
460, are each selected from a serial interconnection, such as RS-232 or RS-422, an
ethernet interconnection, a SCSI interconnection, a Fibre Channel interconnection, an
ESCON interconnection, a FICON interconnection, a Local Area Network (LAN), a
private Wide Area Network (WAN), a public wide area network, Storage Area Network
(SAN), Transmission Control Protocol/Internet Protocol (TCP/IP), the Internet, and
combinations thereof.

A host computer, such as host 390 (FIGs. 3, 4), provides information and a write
command to a primary storage location, such as subsystem 301a (FIG. 3) disposed in data
storage and retrieval system 301 (FIGs. 3, 4). Using one or more algorithms disposed in
logic 312 (FIG. 3), controller 310 provides the information from a first information
storage medium 305a to a second information storage medium 405 disposed in remote
storage location 401. In certain embodiments, information storage medium 305a
comprises a data cache. In certain embodiments, information storage medium 305a
comprises a DASD. In certain embodiments, information storage medium 405 comprises
a data cache. In certain embodiments, information storage medium 405 comprises a
DASD. Similarly, controllers 320, 340, 350, 370, and 380, independently perform PPRC
operations as instructed from one or more host computers.

In step 220, Applicants' method designates one of the plurality of controllers as a
master controller. For example, in the illustrated embodiments of FIGs. 3 and 4,
Applicants' method in step 220 selects one of controllers 310, 320, 340, 350, 370, or 380,
as a master controller. In certain embodiments, step 220 is performed by a host
computer, such as host computer 390 (FIGs. 3, 4). In certain embodiments, step 220 is
performed by an application running on a host computer, such as application 392 (FIG.
3). In certain embodiments, step 220 is performed by a controller disposed in the host
computer, such as controller 396.

In step 230, Applicants' method provides a host command policy to the master
controller selected in step 220. In certain embodiments, step 230 is performed by a host
computer, such as host computer 390 (FIGs. 3, 4). In certain embodiments, step 230 is
performed by an application running on a host computer, such as application 392 (FIG.
3). In certain embodiments, step 230 is performed by a controller disposed in the host
computer, such as controller 396.

In step 240, Applicants' method at a first time provides one or more first master
controller commands to each target controller, i.e. each controller not designated as the
master controller. In certain embodiments, the one or more first master controller
commands include initial setup and configuration commands, including a designation of
the master controller and the target controllers. In certain embodiments, the master
controller simultaneously provides the one or more first master controller commands to
each target controller.

In other embodiments, in step 240 the master controller provides the one or more
first master controller commands to a first target controller, and that first target controller
relays those one or more first master controller commands to a second target controller.
In these embodiments, the one or more first master controller commands of step 240 are
provided sequentially to each of the target controllers.
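
The two delivery patterns of step 240 can be contrasted with a short Python sketch,
offered under stated assumptions rather than as Applicants' implementation. The Target
class and the broadcast and relay functions are hypothetical, the command is
represented by an arbitrary string, and the loop in broadcast merely stands in for
simultaneous delivery by the master controller.

```python
class Target:
    """Hypothetical target controller that records the commands it is given."""

    def __init__(self, ident):
        self.ident = ident
        self.received = []     # commands delivered to this controller
        self.next_hop = None   # next target in the relay chain, if any

    def deliver(self, commands, forward=False):
        self.received.extend(commands)
        if forward and self.next_hop is not None:
            # Relay pattern: pass the same commands on to the next target.
            self.next_hop.deliver(commands, forward=True)


def broadcast(targets, commands):
    """Master hands the commands directly to every target."""
    for target in targets:
        target.deliver(commands)


def relay(targets, commands):
    """Master hands the commands to the first target only; targets forward them."""
    for current, following in zip(targets, targets[1:]):
        current.next_hop = following
    if targets:
        targets[0].deliver(commands, forward=True)


setup = ["DESIGNATE master=310"]   # placeholder command text, assumed for illustration
direct = [Target(n) for n in (320, 340, 350, 370, 380)]
chained = [Target(n) for n in (320, 340, 350, 370, 380)]
broadcast(direct, setup)
relay(chained, setup)
```

Either pattern leaves every target controller holding the same first master controller
commands; they differ only in which controller hands the commands to which.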

For example and referring to FIGs. 3 and 4, if Applicants' method designates
controller 310 as the master controller in step 220, then in step 240 controller 310
provides a first set of master controller commands to controllers 320, 340, 350, 370, and
380. In this example using the illustrated embodiments of FIGs. 3 and 4, the one or more
first master controller commands of step 240 indicate that controller 310 is designated the
master controller and that controllers 320, 340, 350, 370, and 380, are designated target
controllers.

Using Applicants' apparatus and method, there is no single point of failure
regarding the designation of, and performance by, the master controller. For example in
certain embodiments, the designated master controller is disposed in a first information
storage and retrieval system. Another controller is disposed in that first information
storage and retrieval system, or in another information storage and retrieval system. In
the event the master controller becomes non-operational, the other controller performs the
functions of the master controller.

In certain embodiments, that other controller monitors the operation of the master 
controller, determines if the master controller is operational, and in the event the master 
controller is not operational designates itself as the master controller. In certain 
embodiments, the other controller is one of the designated target controllers. In other
embodiments, the other controller is not one of the designated target controllers.

For example, designated master controller 310 is disposed in system 301.
System 301 includes two subsystems, namely subsystems 301a and 301b.
Master controller 310 is disposed in subsystem 301a. Target controller 320 is disposed
in subsystem 301b. Target controller 320 continuously monitors the operation of
master controller 310. In certain embodiments, at regular intervals target controller 320
sends a "heart beat" signal to master controller 310. Upon receiving that heart beat
signal, master controller 310 sends a responding heart beat signal to target controller 320.

If target controller 320 receives a responding heart beat signal from master
controller 310, then target controller 320 determines that master controller 310 is
operational. Alternatively, if target controller 320 does not receive a responding heart
beat signal from master controller 310, then target controller 320 determines that master
controller 310 is no longer operational. In the event master controller 310 becomes
non-operational, target controller 320 immediately designates itself the master controller,
and performs the functions of the master controller thereafter.
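
The heart beat exchange described above may be sketched as follows. This Python
example is illustrative only: the Controller class, its operational attribute, and the
check_master function are assumptions, and a missing response is modeled as a False
return value rather than as a timeout on communication link 395.

```python
class Controller:
    """Hypothetical controller that can answer a heart beat signal."""

    def __init__(self, ident, is_master=False):
        self.ident = ident
        self.is_master = is_master
        self.operational = True

    def answer_heartbeat(self):
        """Return True (a responding heart beat) only while operational."""
        return self.operational


def check_master(monitor, master):
    """One monitoring interval: the monitor designates itself master on no response."""
    if not master.answer_heartbeat():
        monitor.is_master = True
    return monitor.is_master


master_310 = Controller(310, is_master=True)
target_320 = Controller(320)

before_failure = check_master(target_320, master_310)   # response received
master_310.operational = False                          # simulate failure of 310
after_failure = check_master(target_320, master_310)    # no response; 320 takes over
```

In an actual system the monitoring controller would presumably repeat this check at the
regular interval mentioned above; the sketch shows a single interval on each side of the
failure.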

Neither host 390, nor the remaining target controllers 340, 350, 370, or 380, are 
notified that controller 320 is now functioning as the master controller. Thus, 
Applicants' method provides transparent failover protection in the event a designated 
master controller becomes non-operational. 

In step 250, Applicants' method provides at a second time one or more second
master controller commands to each of the target controllers. Step 250 is performed by
the designated master controller. In certain embodiments, the one or more second master
controller commands cause each of the target controllers to adjust the flow of data into
and/or from the one or more information storage and retrieval systems. In certain
embodiments, the one or more second master controller commands of step 250 include
one or more commands that cause each target controller to stop accepting write
operations from the one or more host computers. In certain embodiments, the one or
more second master controller commands of step 250 include one or more commands
that cause each target controller to stop sending data to one or more remote storage
locations. In certain embodiments, the one or more second master controller commands
of step 250 include one or more commands that cause each target controller to resume
sending data to the one or more remote storage locations. In certain embodiments, the
one or more second master controller commands of step 250 include one or more
commands that cause each target controller to form one or more consistency groups.
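
By way of example only, the four kinds of second master controller commands recited
above could be modeled as an enumeration handled by each target controller. In the
Python sketch below, the Command names, the TargetController class, the issue function,
and the particular command ordering shown are all assumptions made for illustration;
the description does not prescribe an encoding or an order.

```python
from enum import Enum, auto


class Command(Enum):
    """Hypothetical encoding of the second master controller commands of step 250."""
    STOP_ACCEPTING_WRITES = auto()
    STOP_SENDING_TO_REMOTE = auto()
    RESUME_SENDING_TO_REMOTE = auto()
    FORM_CONSISTENCY_GROUP = auto()


class TargetController:
    """Hypothetical target controller whose data flow the master adjusts."""

    def __init__(self, ident):
        self.ident = ident
        self.accepting_host_writes = True
        self.sending_to_remote = True
        self.groups_formed = 0

    def handle(self, command):
        if command is Command.STOP_ACCEPTING_WRITES:
            self.accepting_host_writes = False
        elif command is Command.STOP_SENDING_TO_REMOTE:
            self.sending_to_remote = False
        elif command is Command.RESUME_SENDING_TO_REMOTE:
            self.sending_to_remote = True
        elif command is Command.FORM_CONSISTENCY_GROUP:
            self.groups_formed += 1


def issue(targets, commands):
    """Master side of step 250: give every target the same commands, in order."""
    for target in targets:
        for command in commands:
            target.handle(command)


targets = [TargetController(n) for n in (320, 340, 350, 370, 380)]
issue(targets, [Command.STOP_SENDING_TO_REMOTE,
                Command.FORM_CONSISTENCY_GROUP,
                Command.RESUME_SENDING_TO_REMOTE])
```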

Applicants' method transitions from step 250 to step 260 wherein all the 
controllers, including the master controller, form one or more consistency groups. Thus, 
in step 260 the master controller issues commands to the target controllers to form one or 
more consistency groups, and causes itself to form one or more consistency groups. In 
essence, the master controller is functioning both as a master controller and as a target
controller in step 260.

As those skilled in the art will appreciate, volumes in the primary and secondary
DASDs are "consistent" when all writes have been transferred in their logical order, i.e.,
all earlier writes transferred first before their corresponding dependent writes. In a
banking example, this means that an earlier-in-time $400 deposit is written to the
secondary volume before a later-in-time $300 withdrawal. By "consistency group,"
Applicants mean a collection of updates to the primary volumes, i.e. the first information
storage medium, such as DASD, 305a, 305b, 335a, 335b, 365a, and 365b, such that
dependent writes are secured in a consistent manner. In the banking example, this means
that the withdrawal transaction is in the same consistency group as the deposit or in a
later group; the withdrawal cannot be in an earlier consistency group. Consistency groups
maintain data consistency across volumes and storage devices. If a failure occurs,
consistency groups ensure that data recovered from the secondary volumes will be
consistent. Formation of consistency groups is described in United States Patent Nos.
6,484,187; 5,615,329; and 5,504,861, which are assigned to IBM and incorporated herein
by reference in their entirety.
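
The ordering rule stated in the banking example can be expressed as a simple check.
The following Python sketch assumes, purely for illustration, that consistency groups
are numbered in the order they are formed and that dependencies are supplied as
(dependent write, earlier write) pairs; the function name and data layout are
hypothetical and are not taken from the referenced patents.

```python
def groups_are_consistent(group_of, depends_on):
    """Return True if no dependent write sits in an earlier group than its prerequisite.

    group_of   -- maps a write name to its (assumed) consistency group number
    depends_on -- iterable of (dependent_write, earlier_write) pairs
    """
    return all(group_of[dependent] >= group_of[earlier]
               for dependent, earlier in depends_on)


dependencies = [("withdraw_300", "deposit_400")]

same_group = {"deposit_400": 1, "withdraw_300": 1}
later_group = {"deposit_400": 1, "withdraw_300": 2}
earlier_group = {"deposit_400": 2, "withdraw_300": 1}

results = [groups_are_consistent(g, dependencies)
           for g in (same_group, later_group, earlier_group)]
```

The first two assignments satisfy the rule; the third places the withdrawal in an
earlier group than the deposit and is therefore rejected.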

Applicants' method transitions from step 260 to step 270 wherein each target
controller provides status information to the master controller. In certain embodiments,
the status information of step 270 comprises a flag which the target controller sets if
one or more consistency groups were formed in step 260. In certain embodiments, the
status information of step 270 comprises a byte or a frame which the target controller sets
to 1 if one or more consistency groups were formed in step 260.
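
Steps 260 and 270 may be sketched together as shown below. This Python example is a
simplified illustration under several assumptions: the Controller class, its pending
and groups lists, and the collect_status function are hypothetical, and a status value
of 0 for a target that formed no group is assumed, since only the value 1 is recited
above.

```python
class Controller:
    """Hypothetical controller that groups its pending updates on request."""

    def __init__(self, ident):
        self.ident = ident
        self.pending = []   # updates not yet placed in a consistency group
        self.groups = []    # consistency groups formed so far

    def form_consistency_group(self):
        """Step 260: close pending updates as one group; report whether one formed."""
        if not self.pending:
            return False
        self.groups.append(list(self.pending))
        self.pending.clear()
        return True


def collect_status(master, targets):
    """Steps 260 and 270: all controllers form groups; targets report a flag."""
    master.form_consistency_group()   # the master also acts as a target here
    return {t.ident: 1 if t.form_consistency_group() else 0 for t in targets}


master = Controller(310)
targets = [Controller(320), Controller(340)]
master.pending.append("write-a")
targets[0].pending.append("write-b")

status = collect_status(master, targets)
```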

Applicants' method transitions from step 270 to step 250 and continues.

In certain embodiments, individual steps recited in FIG. 2 may be combined,
eliminated, or reordered.

Applicants' invention further includes an article of manufacture comprising a
computer useable medium, such as computer useable media 314 (FIG. 3), 324 (FIG. 3),
344 (FIG. 3), 354 (FIG. 3), 374 (FIG. 3), and/or 384 (FIG. 3), having computer readable
program code disposed therein to implement Applicants' method to coordinate multiple
information storage and retrieval systems. In certain embodiments, the computer useable
medium having computer readable program code disposed therein implements one or
more steps recited in FIG. 2.

Applicants' invention further includes a computer program product, such as 
computer program products 316 (FIG. 3), 326 (FIG. 3), 346 (FIG. 3), 356 (FIG. 3), 376
(FIG. 3), and/or 386 (FIG. 3), usable with a programmable computer processor having 
computer readable program code embodied therein to implement Applicants' method to 
coordinate multiple information storage and retrieval systems. In certain embodiments, 
the computer program code implements one or more steps recited in FIG. 2. 

While the preferred embodiments of the present invention have been illustrated in
detail, it should be apparent that modifications and adaptations to those embodiments 
may occur to one skilled in the art without departing from the scope of the present 
invention as set forth in the following claims. 